{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# SARSA On-Policy Control\n",
    "\n",
    "We will apply **SARSA, an on-policy TD control method**, to the Cliff World environment shown below:  \n",
    "\n",
    "![GridWorld](./images/cliffworld.png \"Cliff World\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### SARSA On-Policy Control\n",
    "\n",
    "SARSA samples the environment one step at a time and updates the Q-values after every step, so the ε-greedy policy derived from them improves continuously. The update equation is:\n",
    "\n",
    "$$ Q(S,A) \\leftarrow Q(S,A) + \\alpha \\left[ R + \\gamma Q(S',A') - Q(S,A) \\right] $$\n",
    "\n"
   ]
  },
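  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check of the update rule, here is one worked step. The numbers are illustrative assumptions rather than values from the training run: $\\alpha = 0.25$ and $\\gamma = 0.99$ (the settings used for the agent later in this notebook), $R = -1$, $Q(S,A) = 0$ and $Q(S',A') = -2$.\n",
    "\n",
    "$$ R + \\gamma Q(S',A') = -1 + 0.99 \\times (-2) = -2.98 $$\n",
    "\n",
    "$$ Q(S,A) \\leftarrow 0 + 0.25 \\times (-2.98 - 0) = -0.745 $$\n",
    "\n",
    "If $S'$ is terminal, the bootstrap term $\\gamma Q(S',A')$ is dropped and the target is just $R$."
   ]
  },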
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initial imports and environment setup\n",
    "import gym\n",
    "import gym.envs.toy_text\n",
    "import sys\n",
    "import numpy as np\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# SARSA Learning agent class\n",
    "from collections import defaultdict\n",
    "\n",
    "\n",
    "class SARSAAgent:\n",
    "    def __init__(self, alpha, epsilon, gamma, get_possible_actions):\n",
    "        self.get_possible_actions = get_possible_actions\n",
    "        self.alpha = alpha\n",
    "        self.epsilon = epsilon\n",
    "        self.gamma = gamma\n",
    "        self._Q = defaultdict(lambda: defaultdict(lambda: 0))\n",
    "\n",
    "    def get_Q(self, state, action):\n",
    "        return self._Q[state][action]\n",
    "\n",
    "    def set_Q(self, state, action, value):\n",
    "        self._Q[state][action] = value\n",
    "\n",
    "    # carry out the SARSA update based on the sample (S, A, R, S', A')\n",
    "    def update(self, state, action, reward, next_state, next_action, done):\n",
    "        if not done:\n",
    "            td_error = reward + \\\n",
    "                       self.gamma * self.get_Q(next_state, next_action) - \\\n",
    "                       self.get_Q(state, action)\n",
    "        else:\n",
    "            td_error = reward - self.get_Q(state, action)\n",
    "\n",
    "        new_value = self.get_Q(state, action) + self.alpha * td_error\n",
    "        self.set_Q(state, action, new_value)\n",
    "\n",
    "    # get argmax over actions of Q(s, a), breaking ties at random\n",
    "    def max_action(self, state):\n",
    "        actions = self.get_possible_actions(state)\n",
    "        best_action = []\n",
    "        best_q_value = float(\"-inf\")\n",
    "\n",
    "        for action in actions:\n",
    "            q_s_a = self.get_Q(state, action)\n",
    "            if q_s_a > best_q_value:\n",
    "                best_action = [action]\n",
    "                best_q_value = q_s_a\n",
    "            elif q_s_a == best_q_value:\n",
    "                best_action.append(action)\n",
    "        return np.random.choice(np.array(best_action))\n",
    "\n",
    "    # choose action as per epsilon-greedy policy\n",
    "    def get_action(self, state):\n",
    "        actions = self.get_possible_actions(state)\n",
    "\n",
    "        if len(actions) == 0:\n",
    "            return None\n",
    "\n",
    "        if np.random.random() < self.epsilon:\n",
    "            a = np.random.choice(actions)\n",
    "            return a\n",
    "        else:\n",
    "            a = self.max_action(state)\n",
    "            return a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# training loop: run SARSA episodes and collect the total reward of each one\n",
    "def train_agent(env, agent, episode_cnt=10000, tmax=10000, anneal_eps=True):\n",
    "    episode_rewards = []\n",
    "    for i in range(episode_cnt):\n",
    "        G = 0\n",
    "        state = env.reset()\n",
    "        action = agent.get_action(state)\n",
    "        for t in range(tmax):\n",
    "            next_state, reward, done, _ = env.step(action)\n",
    "            next_action = agent.get_action(next_state)\n",
    "            agent.update(state, action, reward, next_state, next_action, done)\n",
    "            G += reward\n",
    "            if done:\n",
    "                episode_rewards.append(G)\n",
    "                # decay the exploration probability epsilon after each\n",
    "                # completed episode\n",
    "                if anneal_eps:\n",
    "                    agent.epsilon = agent.epsilon * 0.99\n",
    "                break\n",
    "            state = next_state\n",
    "            action = next_action\n",
    "    return np.array(episode_rewards)"
   ]
  },
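  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get a feel for the annealing schedule in `train_agent`: with `anneal_eps=True`, ε is multiplied by 0.99 each time an episode terminates, so after $n$ completed episodes the exploration rate is\n",
    "\n",
    "$$ \\epsilon_n = \\epsilon_0 \\times 0.99^{n} $$\n",
    "\n",
    "As a rough illustration, assuming the starting value $\\epsilon_0 = 0.2$ that is passed to the agent later in this notebook:\n",
    "\n",
    "$$ \\epsilon_{100} = 0.2 \\times 0.99^{100} \\approx 0.073, \\qquad \\epsilon_{500} = 0.2 \\times 0.99^{500} \\approx 0.0013 $$\n",
    "\n",
    "Exploration therefore shrinks quickly but never reaches exactly zero."
   ]
  },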
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# plot rewards\n",
    "def plot_rewards(env_name, rewards, label):\n",
    "    plt.title(\"env={}, Mean reward = {:.1f}\".format(env_name,\n",
    "                                                    np.mean(rewards[-20:])))\n",
    "    plt.plot(rewards, label=label)\n",
    "    plt.grid()\n",
    "    plt.legend()\n",
    "    plt.ylim(-300, 0)\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# helper function to print the greedy policy learnt for Cliff World\n",
    "def print_policy(env, agent):\n",
    "    nR, nC = env._cliff.shape\n",
    "\n",
    "    actions = '^>v<'\n",
    "\n",
    "    for y in range(nR):\n",
    "        for x in range(nC):\n",
    "            if env._cliff[y, x]:\n",
    "                print(\" C \", end='')\n",
    "            elif (y * nC + x) == env.start_state_index:\n",
    "                print(\" X \", end='')\n",
    "            elif (y * nC + x) == nR * nC - 1:\n",
    "                print(\" T \", end='')\n",
    "            else:\n",
    "                print(\" %s \" %\n",
    "                      actions[agent.max_action(y * nC + x)], end='')\n",
    "        print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "    This is a simple implementation of the Gridworld Cliff\n",
      "    reinforcement learning task.\n",
      "\n",
      "    Adapted from Example 6.6 (page 106) from Reinforcement Learning: An Introduction\n",
      "    by Sutton and Barto:\n",
      "    http://incompleteideas.net/book/bookdraft2018jan1.pdf\n",
      "\n",
      "    With inspiration from:\n",
      "    https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/cliff_walking.py\n",
      "\n",
      "    The board is a 4x12 matrix, with (using Numpy matrix indexing):\n",
      "        [3, 0] as the start at bottom-left\n",
      "        [3, 11] as the goal at bottom-right\n",
      "        [3, 1..10] as the cliff at bottom-center\n",
      "\n",
      "    Each time step incurs -1 reward, and stepping into the cliff incurs -100 reward\n",
      "    and a reset to the start. An episode terminates when the agent reaches the goal.\n",
      "    \n"
     ]
    }
   ],
   "source": [
    "# create cliff world environment\n",
    "env = gym.envs.toy_text.CliffWalkingEnv()\n",
    "print(env.__doc__)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create a SARSA agent\n",
    "agent = SARSAAgent(\n",
    "            alpha=0.25,\n",
    "            epsilon=0.2,\n",
    "            gamma=0.99,\n",
    "            get_possible_actions=lambda s: range(env.nA)\n",
    "        )\n",
    "\n",
    "# train agent and get rewards for episodes\n",
    "rewards = train_agent(env, agent, episode_cnt=5000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEICAYAAAC3Y/QeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAgb0lEQVR4nO3deZhdVZnv8e+PSkhCBqakQ0iFJIzdBBM0BTI0UCgtKAjYtgIOoHTfNIq2t8WBNH0R1NjYYuvl0sKNXkRsBtNEDIIgASkCQkCCIRBCIJBIqpMrEAykyECGt//Yq8JJzZVTp07V3r/P8+yn9l5r7b3Xe3Lynn3WHo4iAjMzK5Zdqt0BMzPrfU7+ZmYF5ORvZlZATv5mZgXk5G9mVkBO/mZmBeTkbx2S9ClJD5UsN0naP80PkfRLSa9L+s9U9k1Jr0r6/1Xo6wpJJ7VTVy+psbf7lHeSJkgKSQOq3RfrHid/Q9LJkuZJWifpFUkPSDq9rbYRMSwiXkyLfwOMBvaOiI9IGgdcBBwaEfu0sZ+lkj5asnxsShwty5qqmUzSB0VI+nmL8impvKFKXSssSSdKuj8daKxoUbdfes+UTiHpona2JUnflrQmTf8qSb0SSB/i5F9wkv4G+E/gBqCWLJlfCnywC6uPB56LiC0ly2si4uV22s8DTihZPh54to2yh0u22ZUYKvFB8QpwjKS9S8rOA56rwL66TVJNFfZZzaP7N4HrgC+3rIiIl9JBybCIGAa8A9gGzG5nW9OAM4EpwGTgNODvK9HpvszJv8ok7StpdjriXi7pH0rqLpM0S9IN6ah8saS6VHexpFtbbOt/S7qqG/sW8G/ANyLiRxHxekRsi4gHIuJ/tLNOSDpQ0uVkHxJnpSOtvwfmAvum5evbWH0eWXJvdhzw7TbK5qV9nZ5iXiupQdJflPRjhaSvSloEvNkyMaUhqesl/UnSM8ARXX1dkreAXwBnp+3VAB8Fbmyxnz+XNFfSa218szlV0u8lvSFppaTLSuqah0vOk/RSGiq7pL3OpFiukfQrSW8CJ7b33pE0WNIGSSPT8j9L2iJpRFr+pqTvd6OPfyvpJeA3kmokXZn6+yJwajdf150SEY9FxE+BFzttDOcC8yJiRTv15wHfjYjGiPgv4LvAp3qko/1JRHiq0kT24buALInuCuxP9uY+OdVfBmwEPgDUAP8CzE9144H1wIi0XAOsBo5Kyz8A1rYzLUpt/hwIYGIHffwU8FDJcgAHlvTvP0rq6oHGDra1H9kR2V4p9peBIcDKkrK1ZB8GB5Md7f0VMBD4CrAM2DVtawWwEBgHDCkpOynNXwE8mLY7Dni6o7616Gc90AgcAzyayj4A/Br4O6AhlQ1Nff80MAB4F/AqMKlkO+9IcU0G/gicmeompNfyh+k1mAJsAv6inT5dD7wOHJu2txsdv3fmAR9O8/cALwDvL6n7UDf6eEOKdQhwAdm3tXHptb0/tRnQTr/voP334R078X/mJGBFJ21eAD7VQf3rwLtLluuAddXOB709+ci/uo4ARkXE1yPircjG0n9IOtpMHoqIX0XEVuCnZEmCiPgD8ATZ11eA9wDrI2J+qv9sROzRzjQ5rdM8pLG6olEmEfES8BLZ0f0U4PmI2AD8tqRsMPAocBZwZ0TMjYjNwJVkyeeYkk1eFREr0zZa+igwIyJei4iVQJe/EZX092FgL0mHkB1N3tCiyWlkiejHEbElIp4gG2r4m7R+Q0Q8Fdm3qUXAzew4xAVweURsiIgngSfTa9CeORHx24jYRpawO3rvPACckL4RTU7xnyBpMNn77sFu9PGyiHgzvc4fBb6fXvfXyA5IOnoNT+vgfXhaR+vuDEnHkQ1d3tpBs2FkHwDNXgeGFW3c38m/usaTDZOsbZ6AfyJ78zYrvWpmPTC4ZIjjJuCcNP+xtNwda9LfMd1crxzNQz/HkxIQ8FBJ2aMRsQnYF/hD80op4a0ExpZsa2UH
+9m3Rf0f2mvYiZ8CnwNOBG5rUTceeHeLf7+PA/sASHp3Okn5iqTXyY6aR7bYRst/32Ed9KU0ns7eOw+QHdW/C3iKbEjuBOAoYFlEvNqNPpbut6de13ZJ+ie9feL22m6ufh4wOyKaOmjTBIwoWR4BNEX6GlAUTv7VtRJY3uJoaHhEfKCL6/8nUC+pFvgQJclf0rVqfQVE87Q4NVua+vDhngyqE83J/zjeTv4PlpTNS2WryBIcsP38xDjgv0q21dF/1tWpfbP9drK/PwU+C/wqIta3qFsJPNDi329YRHwm1d8E3A6Mi4jdgWuBco4uS+Pt7L3zMHAI2fvigYh4huw1OJXsg6FZV/pYut9uva6S7urgfXhXm0FGfCvePoF7QUfbb7GvIcBHgJ900nQxO37DmpLKCsXJv7oeA95IJy6HpJNph0nq0snJiHgFaAB+TJYIlpTUXVDyH6jlNCm1CeCLwP+S9GlJIyTtIukvJc3s8Wgz84B3kh2F/jaVPQVMJDu6bk7+s4BTJb1X0kCyS0g3kSW1rpgFTJe0Z/pw/HxpZTqBen1nG4mI5amvbZ2MvQM4WNInJQ1M0xElJ6aHA69FxEZJR5J9O+spHb530gfVAuBC3k72D5Nd1VKa/Lvbx1nAP0iqlbQncHFHjSPi/R28D9/f1WDT+3Iw2fkfpZPau7Zo9iGycwn3d7K5G4AvShoraV+y99b1Xe1LXjj5V1Eax/8gcDiwnOxk4Y+A3buxmZvIToJ1d8inuQ+3ko2vn092tP1H4JvAnJ3ZXhf29xzZid7VEbE2lW0jS2YjSMk9IpYCnwD+D9nr8kHggxHxVhd3dTnZkMRyshOeP21RP463P3w66/NDEbGqjfJ1wPvIxtlXkQ3hfBsYlJp8Fvi6pHVkJ2ZndbHvXelTV947D5Aly8dKlofz9gfszvTxh2Qnvp8kO+f0846b95jjgQ3Ar8i+bWwg+3ctdR5wQ8vhG0nHSSodBvq/wC/JDjqeBu5MZYWigg1zmZGOGJ8EJqeTyWaF4+RvZlZAVRv2kXSKsptilknqcNzQzMx6VlWO/JXdLfkc2Q08jcDvgHPSFQlmZlZh1TryP5LsWuMX0wm8W4AzqtQXM7PCqdaDmsay440ijcC7WzaSNI3sIUwMGTJk6rhx41o26ZJt27axyy7FurDJMRdD0WIuWrxQfszPPffcqxExqmV5tZJ/Wze6tBp/ioiZwEyAurq6ePzxx3dqZw0NDdTX1+/Uuv2VYy6GosVctHih/JgltXkXdrU+QhvZ8S7BWrLrpM3MrBdUK/n/DjhI0sR0zfXZZLeYm5lZL6jKsE9EbJH0ObI7BWuA6yKicM/WMDOrlqr9Mk9E/IrsVm0zM+tlxTptbmZmgJO/mVkhOfmbmRWQk7+ZWQE5+ZuZFVDVrvbpSzZu3sqGt7ZSUyNiGwzZtYYg2PjWNgAG1IjNW7ehdGNydPjrgX1D01vB2vVd/d2TfHDM+Ve0eCGLOSLo6d+XL1TyX9S4ltOv7tKPN+XDb+ZWuwe9zzHnX9HiBU46cRuDB9b06DYLlfwfWPpKRbb7tQ8eSgR8/Y5nti+X+vbdz7Jx87Y26ypl2fPLOPCgA3tlX32FY86/osULWcwDdunZo34oUPJf/9YWvjv3uU7b7T9yKC+++maXt/vFvzqYTx87EYD7nv0jZx2xH6dP2XeHNuccuR8nf38eV35kCkdM2Kt7Hd9JDZv/QH3qV1E45vwrWryQxTygpudPzxYm+R966a87rL/vohM4YNQw3ti4mcmXvf270CuuOJUJF9/Z5jorrjh1h+Ub/+6oNtsNHljDA18+sZs9NjOrHF/tk+yaPllHDB5I7Z5DqtwbM7PKKsyRf3c89NX38C93LeG4A1v9/oGZWS44+Sctr6Ka/v6/qE5HzMx6gYd9zMwKyMk/6ekbKMzM+jInfzOzAnLyT3zcb2ZFUojkv2lr338Wj5lZbypE8v+PZ4r1ICgzs84UIvm/umFbp218vtfM
iqQQyd+J3cxsR8VI/l1q408IMyuOQiT/nvKT84/k2k9MrXY3zMzKVojHO3TlBq6uDA2dcLCf9WNm+VCII/+NW3ypp5lZqUIk/2Vru3C1Ty/0w8ysr8h98t+ytfPEb2ZWNBVL/pIuk/Rfkham6QMlddMlLZO0VNLJleoDwOJVb1Ry82Zm/VKlT/h+LyKuLC2QdChwNjAJ2Be4V9LBEbG1wn3pmMd9zKxAqjHscwZwS0RsiojlwDLgyCr0w8yssCqd/D8naZGk6yTtmcrGAitL2jSmsoqY9fjKzhvhm7zMrFjKGvaRdC+wTxtVlwDXAN8AIv39LnA+bQ+wtHktpqRpwDSA0aNH09DQ0O0+3rFwfZfaPfzww+w+qOMPgJb735n+9JampqY+3b9KcMz5V7R4oXIxl5X8I+KkrrST9EPgjrTYCIwrqa4FVrWz/ZnATIC6urqor6/vdh8H/fZe2LSp03bHHnsMI4cNarvy7jsB2L7/lst9UENDQ5/uXyU45vwrWrxQuZgrebXPmJLFDwFPp/nbgbMlDZI0ETgIeKxy/ajUls3M+q9KXu3zr5IOJxvSWQH8PUBELJY0C3gG2AJcWMkrfbo6lu/PCDMrkool/4j4ZAd1M4AZldp3KR/5m5m1lvs7fJ37zcxay3/y7+Khf1fbmZnlQe6Tv5mZtebkn/i438yKxMnfzKyAcp/8uzqU7yF/MyuS3Cf/XZzVzcxayX3y7/KRv0f9zaxAcp/8zcystdwn//Bvt5uZtZL75N9lHvUxswJx8jczK6DcJ/9o+3diWvFFQWZWJLlP/mZm1lruk7+f529m1lruk7+ZmbXm5J/4kc5mViRO/mZmBZT75N/Vq33MzIok98m/qzzoY2ZF4uRvZlZATv6Jz/eaWZHkPvn7wW5mZq3lPvk3/mlDl9r5ef5mViS5T/5mZtaak7+ZWQE5+Sc+4WtmReLkb2ZWQGUlf0kfkbRY0jZJdS3qpktaJmmppJNLyqdKeirVXSU/VMfMrNeVe+T/NPDXwLzSQkmHAmcDk4BTgB9IqknV1wDTgIPSdEqZfTAzs24qK/lHxJKIWNpG1RnALRGxKSKWA8uAIyWNAUZExCMREcANwJnl9KGn+PuHmRXJgAptdywwv2S5MZVtTvMty9skaRrZtwRGjx5NQ0NDj3e02bx58xi4S8efAC33X8n+lKupqalP968SHHP+FS1eqFzMnSZ/SfcC+7RRdUlEzGlvtTbKooPyNkXETGAmQF1dXdTX13fc2bbcfWeXmp1w/AnsOqCdL0JpG9v333K5D2poaOjT/asEx5x/RYsXKhdzp8k/Ik7aie02AuNKlmuBVam8to1yMzPrRZW61PN24GxJgyRNJDux+1hErAbWSToqXeVzLtDetwczM6uQci/1/JCkRuBo4E5JvwaIiMXALOAZ4G7gwojYmlb7DPAjspPALwB3ldOHnuITvmZWJGWd8I2I24Db2qmbAcxoo/xx4LBy9mtmZuXxHb6JD/zNrEic/M3MCsjJ38ysgJz8zcwKyMk/8fPlzKxInPzNzAqoUs/26Xc6eazPDibtO4KzjxjXeUMzsz6q0Ml/j90Gsnb9Zh78yondGva58x+Oq2CvzMwqr9DJ/9F/ei9LVq9j3F67ddju/i/VM3RQTYdtzMz6k0In/0EDajh83B6dtps4cmjlO2Nm1ot8wtfMrICc/M3MCsjJ38ysgJz8zcwKyMnfzKyAnPzNzArIyd/MrICc/M3MCqiwyf89f/5n1e6CmVnVFDb5HzFhr2p3wcysagqb/M3MiszJ38ysgJz8zcwKyMnfzKyAnPzNzAqocMm/bvye1e6CmVnVFS75T53g5G9mVlbyl/QRSYslbZNUV1I+QdIGSQvTdG1J3VRJT0laJukqdefHc83MrEeUe+T/NPDXwLw26l6IiMPTdEFJ+TXANOCgNJ1SZh/MzKybykr+EbEkIpZ2tb2kMcCIiHgkIgK4ATiznD6YmVn3VfIH3CdK+j3w
BvDPEfEgMBZoLGnTmMraJGka2bcERo8eTUNDQ9mdWvnSSgBefPEFGlhZ9vb6qqamph55vfoTx5x/RYsXKhdzp8lf0r3APm1UXRIRc9pZbTWwX0SskTQV+IWkSUBb4/vR3r4jYiYwE6Curi7q6+s7625rd9+5w+K4/cbB8hfZf/8DqK8/oPvb6ycaGhrYqderH3PM+Ve0eKFyMXea/CPipO5uNCI2AZvS/AJJLwAHkx3p15Y0rQVWdXf7ZmZWnopc6ilplKSaNL8/2YndFyNiNbBO0lHpKp9zgfa+PZiZWYWUe6nnhyQ1AkcDd0r6dao6Hlgk6UngVuCCiHgt1X0G+BGwDHgBuKucPpiZWfeVdcI3Im4DbmujfDYwu511HgcOK2e/ZmZWnsLd4WtmZk7+ZmaF5ORvZlZATv5mZgXk5G9mVkBO/mZmBeTkb2ZWQIVL/p9493jG7jGEM9+5b7W7YmZWNZV8qmefNG6v3fjtxe+pdjfMzKqqcEf+Zmbm5G9mVkhO/mZmBeTkb2ZWQE7+ZmYF5ORvZlZATv5mZgXk5G9mVkBO/mZmBeTkb2ZWQE7+ZmYF5ORvZlZATv5mZgXk5G9mVkBO/mZmBeTkb2ZWQE7+ZmYF5ORvZlZAZSV/Sd+R9KykRZJuk7RHSd10ScskLZV0ckn5VElPpbqrJKmcPpiZWfeVe+Q/FzgsIiYDzwHTASQdCpwNTAJOAX4gqSatcw0wDTgoTaeU2QczM+umspJ/RNwTEVvS4nygNs2fAdwSEZsiYjmwDDhS0hhgREQ8EhEB3ACcWU4fzMys+wb04LbOB36W5seSfRg0a0xlm9N8y/I2SZpG9i2B0aNH09DQUHYne2Ib/UFTU1NhYm3mmPOvaPFC5WLuNPlLuhfYp42qSyJiTmpzCbAFuLF5tTbaRwflbYqImcBMgLq6uqivr++su63dfecOizu1jX6ooaGhMLE2c8z5V7R4oXIxd5r8I+KkjuolnQecBrw3DeVAdkQ/rqRZLbAqlde2UW5mZr2o3Kt9TgG+CpweEetLqm4HzpY0SNJEshO7j0XEamCdpKPSVT7nAnPK6UN37Lv74N7alZlZn1bumP/VwCBgbrpic35EXBARiyXNAp4hGw66MCK2pnU+A1wPDAHuSlOveHj6e3trV2ZmfVpZyT8iDuygbgYwo43yx4HDytmvmZmVx3f4mpkVkJO/mVkBOfmbmRWQk7+ZWQE5+ZuZFZCTv5lZATn5m5kVkJO/mVkBOfmbmRWQk7+ZWQE5+ZuZFVDuk3/d+D2r3QUzsz4n98l//N5Dq90FM7M+J/fJP9r/oTAzs8LKffJ37jczay3/yd/MzFpx8jczK6DcJ3+P+piZtZb75G9mZq05+ZuZFZCTv5lZAeU++avaHTAz64Nyn/x9wtfMrLXcJ38zM2vNyd/MrIByn/wjPPBjZtZS/pN/tTtgZtYHlZX8JX1H0rOSFkm6TdIeqXyCpA2SFqbp2pJ1pkp6StIySVdJ8gU5Zma9rNwj/7nAYRExGXgOmF5S90JEHJ6mC0rKrwGmAQel6ZQy+2BmZt1UVvKPiHsiYktanA/UdtRe0hhgREQ8Etlg/A3AmeX0wczMuq8nx/zPB+4qWZ4o6feSHpB0XCobCzSWtGlMZWZm1osGdNZA0r3APm1UXRIRc1KbS4AtwI2pbjWwX0SskTQV+IWkSbR9w22752QlTSMbImL06NE0NDR01t1W/vjHjdvnd2b9/qqpqalQ8YJjLoKixQuVi7nT5B8RJ3VUL+k84DTgvWkoh4jYBGxK8wskvQAcTHakXzo0VAus6mDfM4GZAHV1dVFfX99Zd1v5+erfw+psFzuzfn/V0NBQqHjBMRdB0eKFysVc7tU+pwBfBU6PiPUl5aMk1aT5/clO7L4YEauBdZKOSlf5nAvMKacPZmbWfZ0e+XfiamAQMDddsTk/XdlzPPB1SVuArcAFEfFaWuczwPXAELJzBHe13KiZmVVWWck/Ig5sp3w2
MLuduseBw8rZb3f4Ji8zs9byf4evH+9gZtZK7pO/byA2M2st98nfzMxac/I3Myug3Cd/j/mbmbWW++RvZmatOfmbmRWQk7+ZWQGVe4dvn+cRf7P82bx5M42NjWzcuLHzxv3c7rvvzpIlSzptN3jwYGpraxk4cGCXtpv75G9m+dPY2Mjw4cOZMGFC7u/lWbduHcOHD++wTUSwZs0aGhsbmThxYpe2m/9hHx/6m+XOxo0b2XvvvXOf+LtKEnvvvXe3vgnlP/mbWS458e+ou6+Hk7+ZWQHlPvmHx33MrEJmzJjBpEmTmDx5MocffjiPPvooAFu2bGHkyJFMnz59h/b19fUccsghTJkyhSOOOIKFCxdur7vuuut4xzveweTJkznssMOYM2fHnzqZMmUK55xzTo/13Sd8zcx2wiOPPMIdd9zBE088waBBg3j11Vd56623ALjnnns45JBDmDVrFt/61rd2GJK58cYbqaur48c//jFf/vKXmTt3Lo2NjcyYMYMnnniC3XffnaamJl555ZXt6yxZsoRt27Yxb9483nzzTYYOHVp2/538zaxfu/yXi3lm1Rs9us1D9x3B1z44qcM2q1evZuTIkQwaNAiAkSNHbq+7+eab+cIXvsA111zD/PnzOfroo1utf/TRR/Od73wHgJdffpnhw4czbNgwAIYNG7Z9HuCmm27ik5/8JEuWLOH222/vkW8AuR/2UZu/GW9mVp73ve99rFy5koMPPpjPfvazPPDAAwBs2LCB++67j9NOO41zzjmHm2++uc317777bs4880wgG9IZPXo0EydO5NOf/jS//OUvd2j7s5/9jLPOOqvD7XVX7o/8PeZvlm+dHaFXyrBhw1iwYAEPPvgg999/P2eddRZXXHEFQ4cO5cQTT2S33Xbjwx/+MN/4xjf43ve+R01NDQAf//jHefPNN9m6dStPPPEEADU1Ndx999387ne/47777uMf//EfWbBgAZdddhkLFixg1KhRjB8/ntraWs4//3z+9Kc/seeee5bV/9wf+ZuZVUpNTQ319fVcfvnlXH311cyePZubb76Ze++9lwkTJjB16lTWrFnD/fffv32dG2+8keXLl/Oxj32MCy+8cHu5JI488kimT5/OLbfcwuzZ2S/h3nrrrTz77LNMmDCBAw44gDfeeGN7XTmc/M3MdsLSpUt5/vnnty8vXLiQUaNG8dBDD/HSSy+xYsUKVqxYwb//+7+3GqoZOHAg3/zmN5k/fz5Llixh1apV278FNG9r/PjxbNu2jV/84hcsWrRo+/bmzJnTI0M/+R/28aiPmVVAU1MTn//851m7di0DBgzgwAMP5JhjjmH9+vXbTwIDnHHGGXzlK19h06ZNO6w/ZMgQLrroIq688kouvfRSvvSlL7Fq1SoGDx7MqFGjuPbaa5k3bx5jxoxh7Nix29c7/vjjeeaZZ1i9ejVjxozZ6f7nPvmbmVXC1KlTefjhhzttt9dee22/bLOhoWGHuosuumj7/G9+85tW6x5wwAGtymtqali9evVO9HhHuR/22bB5a7W7YGbW5+Q++TcsfaXzRmZmBZP75G9m+eTf595Rd18PJ38z63cGDx7MmjVr/AGQND/Pf/DgwV1exyd8zazfqa2tpbGxcYfn3+TVxo0bu5TUm3/Jq6uc/M2s3xk4cGCXf7Gqv2toaOCd73xnj2+3rGEfSd+QtEjSQkn3SNq3pG66pGWSlko6uaR8qqSnUt1V8i8ymJn1unLH/L8TEZMj4nDgDuBSAEmHAmcDk4BTgB9IqknrXANMAw5K0yll9sHMzLqprOQfEaXPUR3K27+YewZwS0RsiojlwDLgSEljgBER8UhkZ2puAM4spw9mZtZ9ZY/5S5oBnAu8DpyYiscC80uaNaayzWm+ZXl7255G9i0BoEnS0p3s5kjgVX17J9fun0YCr1a7E73MMedf0eKF8mMe31Zhp8lf0r3APm1UXRIRcyLiEuASSdOBzwFfgzYfoh8dlLcpImYC
MzvrY2ckPR4RdeVupz9xzMVQtJiLFi9ULuZOk39EnNTFbd0E3EmW/BuBcSV1tcCqVF7bRrmZmfWicq/2Oahk8XTg2TR/O3C2pEGSJpKd2H0sIlYD6yQdla7yORfY8VeKzcys4sod879C0iHANuAPwAUAEbFY0izgGWALcGFEND9h7TPA9cAQ4K40VVrZQ0f9kGMuhqLFXLR4oUIxy7dHm5kVj5/tY2ZWQE7+ZmYFlOvkL+mU9HiJZZIurnZ/yiHpOkkvS3q6pGwvSXMlPZ/+7llS1+8fryFpnKT7JS2RtFjSF1J5buOWNFjSY5KeTDFfnspzGzOApBpJv5d0R1rOe7wrUl8XSno8lfVuzBGRywmoAV4A9gd2BZ4EDq12v8qI53jgXcDTJWX/Clyc5i8Gvp3mD03xDgImptehJtU9BhxNds/FXcD7qx1bBzGPAd6V5ocDz6XYcht36t+wND8QeBQ4Ks8xp75+kexy8TsK8t5eAYxsUdarMef5yP9IYFlEvBgRbwG3kD12ol+KiHnAay2KzwB+kuZ/wtuPysjF4zUiYnVEPJHm1wFLyO4Iz23ckWlKiwPTFOQ4Zkm1wKnAj0qKcxtvB3o15jwn/7HAypLlDh8l0U+NjuzeCdLfP0vl7cU+lm48XqMvkTQBeCfZkXCu405DIAuBl4G5EZH3mL8PfIXskvFmeY4Xsg/0eyQtSI+xgV6OOc/P8+/WoyRypkcer9FXSBoGzAb+Z0S80cGwZi7ijuyemMMl7QHcJumwDpr365glnQa8HBELJNV3ZZU2yvpNvCWOjYhVkv4MmCvp2Q7aViTmPB/5t/eIiTz5Y/rqR/r7cirPzeM1JA0kS/w3RsTPU3Hu4waIiLVAA9ljz/Ma87HA6ZJWkA3NvkfSf5DfeAGIiFXp78vAbWTD1L0ac56T/++AgyRNlLQr2e8L3F7lPvW024Hz0vx5vP2ojFw8XiP18f8BSyLi30qqchu3pFHpiB9JQ4CTyB6bksuYI2J6RNRGxASy/6O/iYhPkNN4ASQNlTS8eR54H/A0vR1ztc96V3ICPkB2hcgLZE8hrXqfyojlZmA1bz8W+2+BvYH7gOfT371K2l+S4l5KyRUAQF16o70AXE26y7svTsBfkn2NXQQsTNMH8hw3MBn4fYr5aeDSVJ7bmEv6W8/bV/vkNl6yKxCfTNPi5tzU2zH78Q5mZgWU52EfMzNrh5O/mVkBOfmbmRWQk7+ZWQE5+ZuZFZCTv5lZATn5m5kV0H8DbOJ/XYs51noAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# plot rewards\n",
    "plot_rewards(\"Cliff World\", rewards, 'SARSA')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " >  >  >  >  >  >  >  >  >  >  >  v \n",
      " ^  >  >  ^  <  ^  ^  >  >  >  >  v \n",
      " ^  >  >  ^  ^  >  <  ^  ^  >  >  v \n",
      " X  C  C  C  C  C  C  C  C  C  C  T \n"
     ]
    }
   ],
   "source": [
    "# print policy\n",
    "print_policy(env, agent)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### SARSA On-Policy Control for the \"Taxi\" Environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEICAYAAAC3Y/QeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAitUlEQVR4nO3deZhU5Zn38e+PBgFZlaUFms19AAFjy0hMtBOJEqPRmMmok8RMjC/R12wTJ4nEuRyzkDGTzGTimGgYJ1HfcYmJC0ajcS3RqFEhxgVEUYm2kFERhFZBkfv9o05D0VT1UtXV1VXn97muuvrUc7b7qe6+z3Oe85xTigjMzCxd+lQ6ADMz63lO/mZmKeTkb2aWQk7+ZmYp5ORvZpZCTv5mZink5G9VR9Itkj5T6ThqjaRLJX230nFYz3Dyt7KS9KSkluT1rqRNOe+/Wcw2I+LDEXFZJ/efkRSSZrQpvyEpbyomBuscSRfn/L5bJG2WtLGd5UPSGznLX9KT8aZJ30oHYLUtIqa2TkvKAP8TET39D/00cApwVhLHCOAQ4JUejmMnkvpGxJYK7LcuIt4t934i4nTg9Jz9Xgps7WC1GRGxspxxmVv+qSBprKRrJb0i6XlJX8qZd56kayRdLmlj0lJvTOadLenXbbb1Y0kXdENMe0m6S9JaSa9KukLS8Jx5r0l6T078r7a20pPW/Gld2N0VwImS6pL3JwPXA2/nxNMnqe+zSUzXSNo9Z/6vJP1F0uuSFkvKPahdKuknkm5OPsM/SNqrQL0nJa3bz0l6AbgrKT9V0nJJ6yT9TtLEpPxbkv4zme6XtIr/NXk/MDmT2q2TMV4k6beS3gA+IOlASUuTmH8JDOjCZ9plkgYBHwc6ddZm5eXkX+Mk9QF+A/wJGAccAXxF0lE5i30UuBoYDtwIXJiUXwUcLWlosq064G+BK5P3P5W0vsDrsY5CA/4FGAv8FTAeOA8gIp4FvgFcIWlX4BfApRGRKfJjWA0sA45M3p8CXN5mmS8BxwOHJzGtA36SM/8WYB9gNLCU7AEl18nAt4DdgJXAgg5iOpxsvY+SdDzwTeAEYBRwL9nPHuAeoCmZPhj4S7IuwGxgRUSs62SMf5fENQR4CLgB+H/A7sCvyCbmvCS9r53f9XpJ7+ugviTbfwVY3MFyi5OD2HWSJnViu1aMiPCrhl/AXwMvtCmbD/wimT4PuCNn3hTgrZz39wGnJNMfAp4tIZYMcFqBeccDf2xTdiPwOPAY0L8z2ym0T+BTZBPqfsDTybxmoCmZXg4ckbPeGOAdoG+ebQ4HAhiWvL8UuCRn/tHAUwXimZSsu2dO2S3A53Le9wHeBCYCA4FNwAjgbLIHiWZgMNmDzQUF9pMvxstz5h9G9qConLL7ge+W8W/xTuC8DpY5DNglif9C4Il8vwO/Sn+55V/7JgJjc1tpZBNIfc4yf8mZfhMYIKn1etCVZFu1kG05XtkdQUkaLelqSS9J2gD8DzCyzWL/BUwD/jMiNpe4y+uADwJfJNvabWsicH3OZ7QceBeol1Qn6fykS2gDsCpZJzfetp/h4A7iebHNvn+cs+/XyJ4ZjYuIt4BHyLb2DyN7JnA/cGhSdg9kz8o6EWPuPscCL0WScRN/7iDmokkan8Tb9oxrBxGxOCLejoj1wJeByWTPkKybOfnXvheB5yNieM5rSEQc3cn1fwU0SWoAPkZO8tfOIzlyX092sN1/IdsynR4RQ8m2zJWz7cHAfwD/DZyX2/9ejIh4k2wL+wzyJ/8XgQ+3+ZwGRMRLZA96xwFzgGFkW+/kxltMSG32/fk2+x4YEfcn8+8he+A6EHg4eX8UMIvtXSidiTF3n2uAcZJy508oFKyk97fzu26R9P4O6nsKcH9EPNfBcm0FpX3OVoCTf+17CNgg6RvJBcI6SdMkHdyZlSPiFbJdJ78gexBZnjPv9IgYXOA1tdA2E0OAFmC9pHHA19rM/zGwJCJOA24G
Ls63kZwLqJM6UZ1vAodHxKo88y4GFuRcaB0l6bicWDcDa4Fdge91Yl9dcTEwv/UCraRhkj6RM/8esslzWUS8zfaurOeT308xMT4AbAG+JKmvpBPIHkzyioh72/ldD46IezvY3ylku54KkjRV0szkb3Qw8G/AS2TPwqybOfnXuMgO5zsWmAk8D7wKXEK2ddhZV5JtUXZLl0/iW8B7gNfJJvfrWmckSXcu24cIfhV4j6RP5tnOeLLdFS91tMOIWB0R9xWY/WOy1xhuU3Yc+oNkr5dAtquidR/LknndJiKuB74PXJ102TwBfDhnkfvJ9v23tvKXkb0OkHvhtEsxJgeRE4C/J3tx+0RyfgfdSdJsoIHsWWTbebdo+/0e9cAvgQ3Ac2TPXo6JiHfKEVfaaccuP7PqIumfgFci4meVjsWsmjj5m5mlUMW6fSTNlbRC0kpJZ1cqDjOzNKpIyz+5WehpsuPGm8mOYDg5Ipb1eDBmZilUqZb/LGBlRDyXXHi6muwwNTMz6wGVerDbOHa84aSZ7SMrtpE0D5gHMHDgwIPGjx9f1M62bt1Knz47HueaN25lS8Aeu/bhL29upa9gS5Vc/pg0tA/vBry4Mft8rPFD+vDixq2MGCBe2xT07QPvbN15nVUbtu40PXZQH1a/kZ0e3l+s3xyMG9wHAX2Tjyx3vfasbtnK21thzKA+9K/bcd6mLTCgxL+2dZuC198OdusvhvXfeeh3vt9zrUtbndNWXyi9zk8//fSrETGqbXmlkn++mzZ2Sr0RsRBYCNDY2BiPPPJIUTvLZDI0NTXtUHbo+Xfx0vq3uOSURk67/BHG7z6QF197q6jtd5cB/fqwqW3WzmPF+R8BYNLZNwOw8vyPsHVrIEEESDB5/m93Wqd1+dzp58//CN/+zTImjtiVU2ZPJAL69Nnx15O7XnuO/c/7ePyl17nxC4cyvWF4xxXuon/57XJ+tvg5zv7w/px++M7PTcv3e651aatz2uoLpddZUt47tyuV/JvJjs9u1UD2OSM9pvW+xtMuL+6AUg59+/Sh46fd5teasFXEvZDnHjtl23Qx67flAWRmvV+lzp8eBvaRNFnSLsBJZG+w6RG/fXwNzesq28rPZ/89hhSc9/fvndRzgRSpOw4c7fExxaz7VCT5R/bLK74A/I7srdvXRERHz4LpNv/3iqU7lVW6ywdg5OD+26bHDNvx0eonvGdcT4dTtHInaT/oxax0Ffsmr4j4LfDbDhdMkbqcvvYP7j+aK/7wwrb31dCV0hq9bxy0cnvnnXdobm5m06ZNlQ6l7IYNG8by5R0/3mjAgAE0NDTQr1+/Tm03dV/juHlL2b+5rmi5F1qrMn2Wu9/HLNHc3MyQIUOYNGkSqvG/u40bNzJkSOEuYcg2uNauXUtzczOTJ0/u1HbTNWYK2O+fbq10CAXVtfM3XE0Hg3LF6jMKa7Vp0yZGjBhR84m/syQxYsSILp0JpS7592Y7tPzb5LlqSHw99W/o/3cDnPjb6Orn4eTfi3z1Q/tWOoRuUQXHKbPUc/LvRRp227XSIeR1x1cP47JTC37PxzZuiFnaLFiwgKlTpzJ9+nRmzpzJH/7wBwC2bNnCyJEjmT9//g7LNzU1sd9++zFjxgwOPvhgHn300W3zfv7zn3PAAQcwffp0pk2bxqJFi3ZYd8aMGZx88sl0l9Rd8K0e0c67nrX36CHsPbr9C06Q2+1Tnmh9RmG9yQMPPMBNN93E0qVL6d+/P6+++ipvv/02ALfddhv77bcf11xzDd/73vd26JK54ooraGxs5Be/+AVf+9rXuP3222lubmbBggUsXbqUYcOG0dLSwiuvvLJtneXLl7N161YWL17MG2+8waBBg0qO3y3/KtHdiW9oqQ/ayaP1D7zcSVoe6W+9wJo1axg5ciT9+2fvzxk5ciRjx44F4KqrruLLX/4yEyZM4MEH83+p2uzZs3nppewX0L388ssMGTKEwYMHAzB48OAdRu1ceeWVfPrTn+bI
I4/kxhu7535Yt/x7qXIn0Du+ejjN67v3xrajDxjDkj+vY9xuA7t1u60O3Xskl9z3PAdOGF6W7Vt1+tZvnmTZ6g3dus0pY4fyz8e2/zXURx55JN/+9rfZd999mTNnDieeeCKHH344b731FnfeeSc/+9nPWL9+PVdddRWzZ8/eaf1bb72V448/Hsh26dTX1zN58mSOOOIITjjhBI499thty/7yl7/k9ttvZ8WKFVx44YXd0v3j5N9L7Zz8dz4arPjuXDZvKe5ZQKOHDmD00AEdL9gFpx46iZNnjWfXXcrzZ/WB/Uez7NtHlW37Zl0xePBglixZwr333svdd9/NiSeeyPnnn8+gQYP4wAc+wK677srHP/5xvvOd7/CjH/2Iurrso24/+clP8sYbb/Duu++ydGn2aQN1dXXceuutPPzww9x55538wz/8A0uWLOG8885jyZIljBo1iokTJ9LQ0MCpp57KunXr2G233UqK3/9FVSLfmUD/vnX071u384wKkVT2xOzEb2111EIvp7q6OpqammhqauKAAw7gsssuo1+/fvz+979n0qRJAKxdu5a7776bOXPmANk+/xkzZnD22Wdz5plnct111wHZ/59Zs2Yxa9YsPvShD/HZz36W8847j1//+tc89dRT27a3YcMGrr32Wk477bSSYneffy8VbVr6k0d2/QLP3xzUsMP7scMG0DixtNaCmWWtWLGCZ555Ztv7Rx99lFGjRnHffffxwgsvsGrVKlatWsVPfvITrrrqqh3W7devH9/97nd58MEHWb58OatXr952FtC6rYkTJ7J161ZuuOEGHnvssW3bW7Ro0U7bK4abURV0yJ678+Bzr7W7zBH7j2bq2KGMyHnoW2dcOncQTU0zOGzfUbz42psA3D//iKJjNbMdtbS08MUvfpH169fTt29f9t57b9773vfy5ptvbrsIDHDcccfx9a9/nc2bN++w/sCBAznrrLP44Q9/yLnnnss//uM/snr1agYMGMCoUaO4+OKLWbx4MWPGjGHcuO0PdjzssMNYtmwZa9asYcyYMUXH7+RfQX/31xMLJv/Wbp4jp9Zz4sETALj1K+/njc1burSPj84YW1KMZpbfQQcdxP3339/hcrvvvvu2YZuZTGaHeWeddda26bvuumundffaa6+dyuvq6lizZk0REe/I3T4V1tGjmnOHNe6/x1AOmrh7uUMysxRw8q+wf//bmZUOwcxSyMm/gtq7Vck3s5q1rxoedtiTuvp5OPn3dr6Z1WwnAwYMYO3atT4AJFqf5z9gQOfv3fEF3x4yZcxQlq3p+C7Eq/7PIQD8eklzuUMyq1oNDQ00Nzfv8PybWrVp06ZOJfXWb/LqLCf/HnLN6bNp+sHdvNrydrvLzd5rBAC/WvIi4Ia/WT79+vXr9DdWVbtMJsOBBx7Y7dt1t08PGdy/L+OKeGSzv7DCzMrByb+C2u2tdFemmZVRqpL/u1urL6O63W9m5ZCq5P++7+98B12PajMywUM9zaxSUpX817ze+W+27wlTxw7tcBl3+ZtZOXi0T4UcM30Me44aXHD+V+bswwuvvcmcKfU9GJWZpYWTfy81ccQgrj3jvZUOw8xqVKq6fXoTD+E0s0oqW/KXdJ6klyQ9mryOzpk3X9JKSSskHVWuGMzMLL9yd/v8KCJ+mFsgaQpwEjAVGAvcIWnfiHi3zLFUnEfwmFlvUYlun+OAqyNic0Q8D6wEZlUgjopyp4+ZVVK5k/8XJD0m6eeSWr88dhzwYs4yzUmZmZn1kJK6fSTdAeyRZ9Y5wEXAd8j2dnwH+DfgVPI3evP2iEiaB8wDqK+v3+kr0DqrpaWl6HW7SyaTYcOGt7a9f/nl/90ppu6MsTfUuae5zrUvbfWF8tW5pOQfEXM6s5yk/wJuSt42A+NzZjcAqwtsfyGwEKCxsTGampqKijOTydDU1AS33lzU+t2hqamJIY/fBxteB7IHs6am5El9SVzF1i+fbXVOEde59qWtvlC+Opdz
tE/u18p/DHgimb4ROElSf0mTgX2Ah8oVR2/lPn8zq6Ryjvb5V0kzyXbprAI+DxART0q6BlgGbAHOTMNIHzOz3qRsyT8iPt3OvAXAgnLt28zM2uc7fCvEd/iaWSU5+VfIGU17VToEM0sxJ/8eFMmI1hu/cCj71g+pcDRmlmZO/hUgj/Uxswpz8u8G5x4zZaeyj0wfs1PZuOEDARi4iz92M6ssP8+/G7x/n5GdWu4Hn5jBsTNeYe/R7vIxs8pyE7QHDR3Qj2Omj610GGZmTv5mZmnk5G9mlkJO/mZmKeTkb2aWQk7+ZmYp5ORfJv987BRObBzf8YJmZhXg5F+iTx0yIW/56CED+P7fTO/haMzMOsfJv0R/PXlEpUMwM+syJ38zsxRKTfJ/Ye2bZdmuH8tvZtUoNcl/w6Z3Kh2CmVmv4Qe7lag7Hs980xffx4a3fHAys57j5N8LTBs3rNIhmFnKpKbbx8zMtnPyL5Ev+JpZNXLyL5HY+QDwzaP3r0gsZmad5eRfBvMO26vSIZiZtcvJ38wshZz8zcxSyMm/RL7ga2bVyMm/ZM7+ZlZ9Skr+kj4h6UlJWyU1tpk3X9JKSSskHZVTfpCkx5N5F0huO5uZ9bRSW/5PACcAi3MLJU0BTgKmAnOBn0qqS2ZfBMwD9klec0uMwczMuqikxztExHKAPI3344CrI2Iz8LyklcAsSauAoRHxQLLe5cDxwC2lxFFJHZ23fOqQCYweMqBngjEz66RyPdtnHPBgzvvmpOydZLpteV6S5pE9S6C+vp5MJlNUMC0tLax65JGi1u3IE088wdpBO55A5cY5Z3hr2Utl2X8hLS0tRX9e1cp1rn1pqy+Ur84dJn9JdwB75Jl1TkQsKrRanrJopzyviFgILARobGyMpqam9oMtIJPJMGmfA+GB+4pavz0HTJvGnqMGwX3be76KjbM7ZTKZXhFHT3Kda1/a6gvlq3OHyT8i5hSx3WYg99vLG4DVSXlDnnIzM+tB5RrqeSNwkqT+kiaTvbD7UESsATZKOiQZ5XMKUOjsoSp4sJKZVaNSh3p+TFIzMBu4WdLvACLiSeAaYBlwK3BmRLybrHYGcAmwEniWKr7Ya2ZWrUod7XM9cH2BeQuABXnKHwGmlbJfMzMrTWru8I2Cl5VL404fM6tGqUn+5eIufzOrRk7+3aBcZxVmZuXi5G9mlkJO/iVyt4+ZVSMn/xLJl3zNrAqlJvmXq4V+QMMwt/7NrOqkJvl390XZfnXZjD9kQLmejWdmVj6pSf5mZradk3+R3NdvZtXMyb9IUfhJ1GZmvZ6Tf4mEGDt8YKXDMDPrEif/brDrLn1Zdf5HKh2GmVmnOfmbmaVQapJ/d/bRf/Po/bttW2ZmlZCa5N+dPrh/vR/mZmZVzcnfzCyFnPyL5Ec6mFk1c/I3M0shJ38zsxRKTfLv7scx+IKvmVWz1CT/UoZ6nnDguJ221sp9/2ZWjVKT/Eux1+jBlQ7BzKxb+WH0HWjYbefn9uR2+eRO3/yl99G87q0eiMrMrDRO/h1o2m9U3vJ83T1Txw5j6thhZY7IzKx07vbpQKELxb7ga2bVzMm/RL7ga2bVqKTkL+kTkp6UtFVSY075JElvSXo0eV2cM+8gSY9LWinpAql3p8/eHZ2ZWXFKbfk/AZwALM4z79mImJm8Ts8pvwiYB+yTvOaWGEOnFNtNE+EDgJnVnpKSf0Qsj4gVnV1e0hhgaEQ8EBEBXA4cX0oMleDufjOrduUc7TNZ0h+BDcA/RcS9wDigOWeZ5qQsL0nzyJ4lUF9fTyaTKSqQlpYWnlu6pKh1V69+ibde27Hp//BDDzNsF1i7Ce655x769ul9pwYtLS1Ff17VynWufWmrL5Svzh0mf0l3AHvkmXVORCwqsNoaYEJErJV0EHCDpKmQd+hMwYZ0RCwEFgI0NjZGU1NTR+HmlclkGL/XTHjg911ed9y4cdQPHQBPbz/BOXjWwfzm/X15eNVrzJlZ8NhVUZlMhmI/
r2rlOte+tNUXylfnDpN/RMzp6kYjYjOwOZleIulZYF+yLf2GnEUbgNVd3X5vMHb4QI7rpYnfzKwjZRnqKWmUpLpkek+yF3afi4g1wEZJhySjfE4BCp09dHNMPbEXM7PqUOpQz49JagZmAzdL+l0y6zDgMUl/An4NnB4RryXzzgAuAVYCzwK3lBJDJfgGLzOrdiVd8I2I64Hr85RfC1xbYJ1HgGml7LcYxSZsnzCYWS3yHb5mZimUmuR/8+Nrum1bpXw3gJlZb5Ca5L9w8XOVDsHMrNdITfI3M7PtnPw7IGnbMNEJu+8KQMNuu1YwIjOz0vnLXLrg6APGcPaH9690GGZmJXPLvxM8rt/Mao2Tfxf4LmEzqxVO/mZmKeTkb2aWQk7+ZmYp5ORvZpZCTv5mZink5G9mlkJO/mZmKeTk3wW+2cvMaoWTfyf45i4zqzVO/p3gFr+Z1Ron/w7ktvp9BmBmtcLJ38wshZz8zcxSyMnfzCyFnPzNzFLIyb8Dwld5zaz2OPmbmaWQk38Hjps5ttIhmJl1Oyf/dhyy5+7MGD+80mGYmXW7kpK/pB9IekrSY5KulzQ8Z958SSslrZB0VE75QZIeT+ZdIPXeW6fa9vf7Tl8zqxWltvxvB6ZFxHTgaWA+gKQpwEnAVGAu8FNJdck6FwHzgH2S19wSYyi73nt4MjMrTknJPyJui4gtydsHgYZk+jjg6ojYHBHPAyuBWZLGAEMj4oGICOBy4PhSYiin1qTvFr+Z1Zq+3bitU4FfJtPjyB4MWjUnZe8k023L85I0j+xZAvX19WQymaICa2lpgSKGbK5bt45MJsPzz70NwIsvvEAm85eiYuhpLS0tRX9e1cp1rn1pqy+Ur84dJn9JdwB75Jl1TkQsSpY5B9gCXNG6Wp7lo53yvCJiIbAQoLGxMZqamjoKN6/sB/dGl9cbPnw3mpoO4clYCc+sYMLECTQ17V9UDD0tk8lQ7OdVrVzn2pe2+kL56txh8o+IOe3Nl/QZ4BjgiKQrB7It+vE5izUAq5PyhjzlvZL7+s2sVpU62mcu8A3goxHxZs6sG4GTJPWXNJnshd2HImINsFHSIckon1OARaXEYGZmXVdqn/+FQH/g9mTE5oMRcXpEPCnpGmAZ2e6gMyPi3WSdM4BLgYHALcnLzMx6UEnJPyL2bmfeAmBBnvJHgGml7NfMzErjO3zNzFLIyb8dvuBrZrXKyb8LfLOXmdUKJ/9O8BmAmdUaJ/9OcIvfzGqNk3872j7V02cAZlYrnPzNzFLIyb8dx84YU+kQzMzKwsm/gM8fvicnHjyh0mGYmZWFk38Bffu4g9/MapeTv5lZCjn5m5mlkJO/mVkKOfkX4Bu7zKyWOfl3gQ8IZlYrnPwL8N28ZlbLnPy7wAcEM6sVTv4FuIvHzGqZk7+ZWQo5+ZuZpZCTv5lZCjn5m5mlkJO/mVkKOfmbmaWQk7+ZWQo5+ZuZpZCTfwH57vHyjV9mVitKSv6SfiDpKUmPSbpe0vCkfJKktyQ9mrwuzlnnIEmPS1op6QLJD00wM+tppbb8bwemRcR04Glgfs68ZyNiZvI6Paf8ImAesE/ymltiDGWR74jkw5SZ1YqSkn9E3BYRW5K3DwIN7S0vaQwwNCIeiIgALgeOLyUGMzPruu7s8z8VuCXn/WRJf5R0j6T3J2XjgOacZZqTsl7H3ftmVsv6drSApDuAPfLMOiciFiXLnANsAa5I5q0BJkTEWkkHATdImkr+3pSCeVbSPLJdRNTX15PJZDoKN6+WlpYCuy7shT+/QCbzFwCef+7tncp6u5aWlqI/r2rlOte+tNUXylfnDpN/RMxpb76kzwDHAEckXTlExGZgczK9RNKzwL5kW/q5XUMNwOp29r0QWAjQ2NgYTU1NHYWbV/aDe6NL60yYOIGmpv0BeDJWwjMrdijr7TKZDMV+XtXKda59aasvlK/OpY72mQt8A/hoRLyZUz5KUl0yvSfZC7vPRcQaYKOkQ5JRPqcA
i0qJwczMuq7Dln8HLgT6A7cnIzYfTEb2HAZ8W9IW4F3g9Ih4LVnnDOBSYCDZawS3tN1ob+CBPWZWy0pK/hGxd4Hya4FrC8x7BJhWyn57gi/4mlkt8x2+ZmYp5ORvZpZCTv6dsF/9EAD+aszQCkdiZtY9Sr3gW7NyH+I2Z0o9d3z1MPYePaRyAZmZdSO3/DvJid/MaomTfwF+iJuZ1TInfzOzFHLyNzNLISf/AvytXWZWy5z8zcxSyMnfzCyFnPwL8GgfM6tlTv4FuM/fzGqZk7+ZWQo5+ZuZpZCTv5lZCqUi+f/x5S1dXif8dS5mVsNSkfx/vHRzpUMwM+tVUpH8iyF/i6+Z1TAnfzOzFHLyNzNLISf/AnzB18xqmZO/mVkKOfmbmaWQk38BHu1jZrXMyb8A9/mbWS1z8jczSyEnfzOzFCop+Uv6jqTHJD0q6TZJY3PmzZe0UtIKSUfllB8k6fFk3gWSvzbFzKynldry/0FETI+ImcBNwLkAkqYAJwFTgbnATyXVJetcBMwD9klec0uMwczMuqik5B8RG3LeDoJtV0mPA66OiM0R8TywEpglaQwwNCIeiIgALgeOLyWGcvnseydXOgQzs7LpW+oGJC0ATgFeBz6QFI8DHsxZrDkpeyeZblteaNvzyJ4lALRIWlFkmCOBV7uywpjvF7mn3qPLda4BrnPtS1t9ofQ6T8xX2GHyl3QHsEeeWedExKKIOAc4R9J84AvAP0PeQfLRTnleEbEQWNhRjB2R9EhENJa6nWriOqdD2uqctvpC+ercYfKPiDmd3NaVwM1kk38zMD5nXgOwOilvyFNuZmY9qNTRPvvkvP0o8FQyfSNwkqT+kiaTvbD7UESsATZKOiQZ5XMKsKiUGMzMrOtK7fM/X9J+wFbgz8DpABHxpKRrgGXAFuDMiHg3WecM4FJgIHBL8iq3kruOqpDrnA5pq3Pa6gtlqrOyg27MzCxNfIevmVkKOfmbmaVQTSd/SXOTx0uslHR2peMphaSfS3pZ0hM5ZbtLul3SM8nP3XLmVf3jNSSNl3S3pOWSnpT05aS8ZustaYCkhyT9Kanzt5Lymq0zgKQ6SX+UdFPyvtbruyqJ9VFJjyRlPVvniKjJF1AHPAvsCewC/AmYUum4SqjPYcB7gCdyyv4VODuZPhv4fjI9Jalvf2By8jnUJfMeAmaTvefiFuDDla5bO3UeA7wnmR4CPJ3UrWbrncQ3OJnuB/wBOKSW65zE+lWyw8VvSsnf9ipgZJuyHq1zLbf8ZwErI+K5iHgbuJrsYyeqUkQsBl5rU3wccFkyfRnbH5VR9Y/XAIiINRGxNJneCCwne0d4zdY7slqSt/2SV1DDdZbUAHwEuCSnuGbr244erXMtJ/9xwIs579t9lESVqo/svRMkP0cn5YXqPo4uPF6jN5E0CTiQbEu4puuddIE8CrwM3B4RtV7n/wC+TnbIeKtari9kD+i3SVqSPMYGerjOJT/bpxfr0qMkaky3PF6jt5A0GLgW+EpEbGinW7Mm6h3Ze2JmShoOXC9pWjuLV3WdJR0DvBwRSyQ1dWaVPGVVU98ch0bEakmjgdslPdXOsmWpcy23/As9YqKW/G9y6kfy8+WkvGYeryGpH9nEf0VEXJcU13y9ASJiPZAh+9jzWq3zocBHJa0i2zX7QUn/Q+3WF4CIWJ38fBm4nmw3dY/WuZaT/8PAPpImS9qF7PcL3FjhmLrbjcBnkunPsP1RGTXxeI0kxv8GlkfEv+fMqtl6SxqVtPiRNBCYQ/axKTVZ54iYHxENETGJ7P/oXRHxKWq0vgCSBkka0joNHAk8QU/XudJXvcv5Ao4mO0LkWbJPIa14TCXU5SpgDdsfi/05YARwJ/BM8nP3nOXPSeq9gpwRAEBj8of2LHAhyV3evfEFvI/saexjwKPJ6+harjcwHfhjUucngHOT8pqtc068TWwf7VOz9SU7AvFPyevJ1tzU03X24x3MzFKolrt9zMysACd/M7MUcvI3
M0shJ38zsxRy8jczSyEnfzOzFHLyNzNLof8Pn0zHOU6Ji5sAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# create taxi environment\n",
    "env = gym.make(\"Taxi-v3\")\n",
    "\n",
    "# create a SARSA agent\n",
    "agent = SARSAAgent(\n",
    "            alpha=0.25,\n",
    "            epsilon=0.2,\n",
    "            gamma=0.99,\n",
    "            get_possible_actions=lambda s: range(env.nA)\n",
    "        )\n",
    "\n",
    "# train agent and get rewards for episodes\n",
    "rewards = train_agent(env, agent, episode_cnt=5000)\n",
    "\n",
    "# plot reward graph\n",
    "plot_rewards(\"Taxi\", rewards, 'SARSA')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conclusion\n",
    "\n",
    "We see that the SARSA agent settles on a policy by the end of 500 episodes of training. The learnt policy avoids the cliff by first going all the way up and then turning right to walk towards the goal. This may seem surprising: we might have expected the agent to skirt the edge of the cliff and reach the goal by the shortest path, which is 4 steps shorter than the one the agent learnt.\n",
    "\n",
    "However, because the policy keeps exploring with ε-greedy, there is always a small chance that the agent, when next to a cliff cell, takes a random action and falls off the cliff. SARSA evaluates the same ε-greedy policy it follows, so its Q-values include this risk and the path far from the cliff scores higher. This demonstrates the cost of continued exploration even after enough has been learnt about the environment, i.e. when the same ε-greedy policy is used both for sampling and for improvement."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
